Fuzzy Term Proximity With Boolean Queries at 2006 TREC Terabyte Task

Authors

  • Annabelle Mercier
  • Michel Beigbeder
Abstract

We report the results of a fuzzy term proximity method applied to the Terabyte Task. Fuzzy proximity rests on the idea that the closer the query terms are in a document, the more relevant the document is. This principle yields a high-precision method, so we complete its results with those obtained with the Zettair search engine's default method (Dirichlet). Our model is able to deal with Boolean queries but, contrary to the traditional extensions of the basic Boolean IR model, it does not explicitly use a proximity operator, because such an operator cannot be generalized to nodes. Fuzzy term proximity is controlled by an influence function. Given a query term and a document, the influence function associates with each position in the text a value that depends on the distance to the nearest occurrence of this query term. To model proximity, this function decreases with distance. Different function shapes can be used: triangular, Gaussian, etc. For practical reasons, only functions with finite support were used. The support of the function is limited by a constant called k. The fuzzy term proximity functions are associated with the leaves of the query tree. Fuzzy proximities are then computed for every node with a post-order tree traversal. Given the fuzzy proximities of the children of a node, its fuzzy proximity is computed, as in fuzzy IR models, with a minimum (resp. maximum) combination for conjunctive (resp. disjunctive) nodes. Finally, a fuzzy query proximity value is obtained at the root of the query tree for each position in the document. The score of the document is the integral of the function obtained at the tree root. For the experiments, we modified Lucy (version 0.5.2) to implement our matching function. Two query sets are used for our runs. One set is built manually from the title words (and sometimes some description words). Each of these words is OR'ed with its derivatives, such as plurals. The resulting OR nodes are then AND'ed at the tree root. The other query set is built automatically as an AND of terms automatically extracted from the title field. These two query sets are submitted to our system with two values of k: 50 and 200. The two corresponding query sets with flat queries are also submitted to the Zettair search engine.
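The scoring scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Lucy-based implementation: the class names (Term, And, Or), the helper functions, the triangular shape max(0, (k - d) / k), and the use of a plain sum over positions as the discrete "integration" (with no normalization) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Term:
    """Leaf of the query tree: a single query term (hypothetical class)."""
    word: str


@dataclass
class And:
    """Conjunctive node: children are combined with a minimum."""
    children: list


@dataclass
class Or:
    """Disjunctive node: children are combined with a maximum."""
    children: list


Node = Union[Term, And, Or]


def build_positions(tokens: List[str]) -> Dict[str, List[int]]:
    """Map each word of a tokenized document to the positions where it occurs."""
    index: Dict[str, List[int]] = {}
    for pos, word in enumerate(tokens):
        index.setdefault(word, []).append(pos)
    return index


def triangular_influence(positions: List[int], doc_len: int, k: int) -> List[float]:
    """Proximity to the nearest occurrence at every text position.

    Assumed shape: max(0, (k - d) / k), where d is the distance to the
    nearest occurrence, so the support is bounded by k.
    """
    prox = [0.0] * doc_len
    for p in positions:
        # Only positions at distance < k receive a non-zero value.
        for x in range(max(0, p - k + 1), min(doc_len, p + k)):
            value = (k - abs(x - p)) / k
            if value > prox[x]:
                prox[x] = value
    return prox


def node_proximity(node: Node, index: Dict[str, List[int]],
                   doc_len: int, k: int) -> List[float]:
    """Post-order traversal: leaves first, then min (AND) or max (OR)."""
    if isinstance(node, Term):
        return triangular_influence(index.get(node.word, []), doc_len, k)
    child_values = [node_proximity(c, index, doc_len, k) for c in node.children]
    combine = min if isinstance(node, And) else max
    return [combine(values) for values in zip(*child_values)]


def score(query: Node, index: Dict[str, List[int]], doc_len: int, k: int) -> float:
    """Discrete integration (sum) of the proximity function at the tree root."""
    return sum(node_proximity(query, index, doc_len, k))


# Usage: each title word OR'ed with a variant, the OR nodes AND'ed at the root.
tokens = "fuzzy proximity ranks documents by term proximity".split()
query = And([Or([Term("fuzzy")]), Or([Term("document"), Term("documents")])])
doc_score = score(query, build_positions(tokens), len(tokens), k=5)
```

In this sketch a larger k widens the zone of influence around each occurrence, so more distant co-occurrences still contribute to the score; the abstract's runs use k = 50 and k = 200.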


Similar resources

Peking University at the TREC 2006 Terabyte Track

This paper details the experiments carried out at TREC 2006 Terabyte Track using Indri Search Engine. There were three tasks in the Terabyte track of TREC 2006, i.e. efficiency task, ad hoc task and named page finding task. We participated in two tasks, and submitted 5 runs for ad hoc task and 3 runs for named page task respectively. In ad hoc task, we looked at the importance of term proximity...


Juru at TREC 2006: TAAT versus DAAT in the Terabyte Track

Our experiments focused this year on the ad hoc task of the Terabyte track. We experimented with WAND, a document-at-a-time evaluation algorithm we developed recently. Our results demonstrate the superiority of WAND over the traditional term-at-a-time strategy while searching over a large collection such as GOV2. We demonstrate how Web expansion can be successfully applied to significantly improve sear...


Indri at TREC 2004: Terabyte Track

This paper provides an overview of experiments carried out at the TREC 2004 Terabyte Track using the Indri search engine. Indri is an efficient, effective distributed search engine. Like INQUERY, it is based on the inference network framework and supports structured queries, but unlike INQUERY, it uses language modeling probabilities within the network which allows for added flexibility. We des...


Fuzzy Proximity Ranking with Boolean Queries

Based on the idea that the closer the query terms are in a document, the more relevant this document is, we experiment with an IR method based on a fuzzy proximity degree of the query term occurrences in a document to compute its relevance to the query. Our model is able to deal with Boolean queries, but contrary to the traditional extensions of the basic Boolean IR model, it does not explicitly use ...


Experiments with the Negotiated Boolean Queries of the TREC 2006 Legal Discovery Track

We analyze the results of several experimental runs submitted to the Legal Discovery and Terabyte Tracks of TREC 2006. In the Legal Track, the final negotiated boolean queries produced higher mean scores in average precision and Precision@10 than a corresponding vector run of the same query terms, but the vector run usually recalled more relevant items by rank 5000, and on average the boolean q...




Publication date: 2006